procedural memory


Real-Time Procedural Learning From Experience for AI Agents

Bi, Dasheng, Hu, Yubin, Nasir, Mohammed N.

arXiv.org Artificial Intelligence

Learning how to do things from trial and error in real time is a hallmark of biological intelligence, yet most LLM-based agents lack mechanisms to acquire procedural knowledge after deployment. We propose Procedural Recall for Agents with eXperiences Indexed by State (PRAXIS), a lightweight post-training learning mechanism that stores the consequences of actions and retrieves them by jointly matching environmental and internal states of past episodes to the current state. PRAXIS augments agentic action selection with retrieved state-action-result exemplars that are generated in real time. When evaluated on the REAL web browsing benchmark, PRAXIS improves task completion accuracy, reliability, and cost efficiency across different foundation model backbones, and shows preliminary generalization to unseen tasks in similar environments. These results demonstrate that PRAXIS enables the practical adoption of AI agents in fast-evolving stateful environments by helping them learn new procedures effectively.
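The core mechanism described above — storing state-action-result triples and retrieving them by matching past states to the current one — can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, the set-based state representation, and the Jaccard similarity are all illustrative assumptions standing in for whatever state encoding and matching function PRAXIS actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class Exemplar:
    state: frozenset   # features describing environmental + internal state
    action: str
    result: str

@dataclass
class ExemplarMemory:
    """Toy state-indexed procedural memory: store what happened after each
    action, then retrieve the episodes whose states best match the present."""
    exemplars: list = field(default_factory=list)

    def store(self, state, action, result):
        self.exemplars.append(Exemplar(frozenset(state), action, result))

    def retrieve(self, state, k=2):
        # Rank stored exemplars by Jaccard overlap with the current state;
        # the top-k would be appended to the agent's prompt as exemplars.
        s = frozenset(state)
        def similarity(e):
            union = s | e.state
            return len(s & e.state) / len(union) if union else 0.0
        return sorted(self.exemplars, key=similarity, reverse=True)[:k]
```

At decision time, the agent's action-selection prompt is augmented with the retrieved exemplars, so a past "clicking checkout led to the payment page" episode can steer the next action in a similar state.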


LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation

Han, Dongge, Couturier, Camille, Diaz, Daniel Madrigal, Zhang, Xuchao, Rühle, Victor, Rajmohan, Saravan

arXiv.org Artificial Intelligence

We introduce LEGOMem, a modular procedural memory framework for multi-agent large language model (LLM) systems in workflow automation. LEGOMem decomposes past task trajectories into reusable memory units and flexibly allocates them across orchestrators and task agents to support planning and execution. To explore the design space of memory in multi-agent systems, we use LEGOMem as a lens and conduct a systematic study of procedural memory in multi-agent systems, examining where memory should be placed, how it should be retrieved, and which agents benefit most. Experiments on the OfficeBench benchmark show that orchestrator memory is critical for effective task decomposition and delegation, while fine-grained agent memory improves execution accuracy. We find that even teams composed of smaller language models can benefit substantially from procedural memory, narrowing the performance gap with stronger agents by leveraging prior execution traces for more accurate planning and tool use. These results position LEGOMem as both a practical framework for memory-augmented agent systems and a research tool for understanding memory design in multi-agent workflow automation.
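The allocation scheme in the abstract — plan-level memory for the orchestrator, step-level memory for task agents — can be sketched as follows. This is a minimal illustration under assumed names, not the LEGOMem codebase: a trajectory is modeled as a flat list of (agent, step) pairs, and "memory units" are just the raw steps.

```python
from collections import defaultdict

class ModularMemoryStore:
    """Toy modular procedural memory in the spirit of LEGOMem: past task
    trajectories are decomposed into reusable units and allocated by role."""
    def __init__(self):
        self.units = defaultdict(list)   # role -> list of memory units

    def ingest(self, trajectory):
        # The orchestrator keeps the whole plan (to support decomposition
        # and delegation); each task agent keeps only the steps it executed
        # (to support accurate execution and tool use).
        self.units["orchestrator"].append([step for _, step in trajectory])
        for agent, step in trajectory:
            self.units[agent].append(step)

    def retrieve(self, role):
        return self.units[role]
```

Separating the two memory scopes is what makes the design-space questions in the abstract askable: one can ablate orchestrator memory, agent memory, or both, and measure which allocation helps which role.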


TokMem: Tokenized Procedural Memory for Large Language Models

Wu, Zijun, Hao, Yongchang, Mou, Lili

arXiv.org Artificial Intelligence

Large language models rely heavily on prompts to specify tasks, recall knowledge and guide reasoning. However, this reliance is inefficient as prompts must be re-read at each step, scale poorly across tasks, and lack mechanisms for modular reuse. We introduce TokMem, a tokenized procedural memory that stores recurring procedures as compact, trainable embeddings. Each memory token encodes both an address to a procedure and a control signal that steers generation, enabling targeted behavior with constant-size overhead. To support continual adaptation, TokMem keeps the backbone model frozen, allowing new procedures to be added without interfering with existing ones. We evaluate TokMem on 1,000 tasks for atomic recall, and on function-calling tasks for compositional recall, where it consistently outperforms retrieval-augmented generation while avoiding repeated context overhead, and fine-tuning with far fewer parameters. These results establish TokMem as a scalable and modular alternative to prompt engineering and fine-tuning, offering an explicit procedural memory for LLMs.
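The addressing scheme described above — a memory token that pairs an address with a control signal, added without touching existing entries — can be sketched in a few lines. This is a conceptual toy, not TokMem itself: real memory tokens are trainable embeddings optimized against a frozen backbone, whereas here the vectors are random and only the key-based lookup is shown.

```python
import math
import random

random.seed(0)

def _rand_vec(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class TokenMemory:
    """Toy tokenized procedural memory: each procedure is a pair of vectors,
    an address key and a control embedding. Only these tokens would be
    trained; the backbone stays frozen, so adding a procedure never
    interferes with existing ones."""
    def __init__(self, dim=8):
        self.dim = dim
        self.keys = {}     # procedure name -> address vector
        self.values = {}   # procedure name -> control embedding

    def add_procedure(self, name):
        self.keys[name] = _rand_vec(self.dim)
        self.values[name] = _rand_vec(self.dim)

    def recall(self, query):
        # Address the memory by cosine similarity between the query and the
        # stored keys; the returned embedding steers generation in place of
        # a re-read natural-language prompt.
        best = max(self.keys, key=lambda n: _cos(self.keys[n], query))
        return best, self.values[best]
```

The constant-size overhead claimed in the abstract follows from this design: recalling a procedure costs one embedding rather than re-reading its full prompt at every step.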


The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI

Oakley, Barbara, Johnston, Michael, Chen, Ken-Zen, Jung, Eulho, Sejnowski, Terrence J.

arXiv.org Artificial Intelligence

In the age of generative AI and ubiquitous digital tools, human cognition faces a structural paradox: as external aids become more capable, internal memory systems risk atrophy. Drawing on neuroscience and cognitive psychology, this paper examines how heavy reliance on AI systems and discovery-based pedagogies may impair the consolidation of declarative and procedural memory -- systems essential for expertise, critical thinking, and long-term retention. We review how tools like ChatGPT and calculators can short-circuit the retrieval, error correction, and schema-building processes necessary for robust neural encoding. Notably, we highlight striking parallels between deep learning phenomena such as "grokking" and the neuroscience of overlearning and intuition. Empirical studies are discussed showing how premature reliance on AI during learning inhibits proceduralization and intuitive mastery. We argue that effective human-AI interaction depends on strong internal models -- biological "schemata" and neural manifolds -- that enable users to evaluate, refine, and guide AI output. The paper concludes with policy implications for education and workforce training in the age of large language models.


A Proposal to Extend the Common Model of Cognition with Metacognition

Laird, John, Lebiere, Christian, Rosenbloom, Paul, Stocco, Andrea

arXiv.org Artificial Intelligence

The Common Model of Cognition (CMC) provides an abstract characterization of the structure and processing required by a cognitive architecture for human-like minds. We propose a unified approach to integrating metacognition within the CMC. We propose that metacognition involves reasoning over explicit representations of an agent's cognitive capabilities and processes in working memory. Our proposal exploits the existing cognitive capabilities of the CMC, making minimal extensions to the structure and information available within working memory. We provide examples of metacognition within our proposal.


Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

Wheeler, Schaun, Jeunen, Olivier

arXiv.org Artificial Intelligence

Large Language Models (LLMs) represent a landmark achievement in Artificial Intelligence (AI), demonstrating unprecedented proficiency in procedural tasks such as text generation, code completion, and conversational coherence. These capabilities stem from their architecture, which mirrors human procedural memory -- the brain's ability to automate repetitive, pattern-driven tasks through practice. However, as LLMs are increasingly deployed in real-world applications, it becomes impossible to ignore their limitations operating in complex, unpredictable environments. This paper argues that LLMs, while transformative, are fundamentally constrained by their reliance on procedural memory. To create agents capable of navigating "wicked" learning environments -- where rules shift, feedback is ambiguous, and novelty is the norm -- we must augment LLMs with semantic memory and associative learning systems. By adopting a modular architecture that decouples these cognitive functions, we can bridge the gap between narrow procedural expertise and the adaptive intelligence required for real-world problem-solving.


Human-inspired Perspectives: A Survey on AI Long-term Memory

He, Zihong, Lin, Weizhe, Zheng, Hao, Zhang, Fan, Jones, Matt W., Aitchison, Laurence, Xu, Xuhai, Liu, Miao, Kristensson, Per Ola, Shen, Junxiao

arXiv.org Artificial Intelligence

With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term memory capabilities, formulates a theoretical framework, and inspires the development of next-generation AI long-term memory systems. This paper begins by introducing the mechanisms of human long-term memory, then explores AI long-term memory mechanisms, establishing a mapping between the two. Based on the mapping relationships identified, we extend the current cognitive architectures and propose the Cognitive Architecture of Self-Adaptive Long-term Memory (SALM). SALM provides a theoretical framework for the practice of AI long-term memory and holds potential for guiding the creation of next-generation long-term memory driven AI systems. Finally, we delve into the future directions and application prospects of AI long-term memory.


Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion

Humphreys, Joseph, Zhou, Chengxu

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) has revolutionised quadruped robot locomotion, but existing control frameworks struggle to generalise beyond their training-induced observational scope, resulting in limited adaptability. In contrast, animals achieve exceptional adaptability through gait transition strategies, diverse gait utilisation, and seamless adjustment to immediate environmental demands. Inspired by these capabilities, we present a novel DRL framework that incorporates key attributes of animal locomotion: gait transition strategies, pseudo gait procedural memory, and adaptive motion adjustments. This approach enables our framework to achieve unparalleled adaptability, demonstrated through blind zero-shot deployment on complex terrains and recovery from critically unstable states. Our findings offer valuable insights into the biomechanics of animal locomotion, paving the way for robust, adaptable robotic systems.


Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Cherepanov, Egor, Kachaev, Nikita, Zholus, Artem, Kovalev, Alexey K., Panov, Aleksandr I.

arXiv.org Artificial Intelligence

The incorporation of memory into agents is essential for numerous tasks within the domain of Reinforcement Learning (RL). In particular, memory is paramount for tasks that require the utilization of past information, adaptation to novel environments, and improved sample efficiency. However, the term "memory" encompasses a wide range of concepts, which, coupled with the lack of a unified methodology for validating an agent's memory, leads to erroneous judgments about agents' memory capabilities and prevents objective comparison with other memory-enhanced agents. This paper aims to streamline the concept of memory in RL by providing practical, precise definitions of agent memory types, such as long-term versus short-term memory and declarative versus procedural memory, inspired by cognitive science. Using these definitions, we categorize different classes of agent memory, propose a robust experimental methodology for evaluating the memory capabilities of RL agents, and standardize evaluations. Furthermore, we empirically demonstrate the importance of adhering to the proposed methodology when evaluating different types of agent memory by conducting experiments with different RL agents and showing what its violation leads to. Reinforcement Learning (RL) effectively addresses various problems within the Markov Decision Process (MDP) framework, where agents make decisions based on immediately available information (Mnih et al., 2015; Badia et al., 2020). However, there are still challenges in applying RL to more complex tasks with partial observability. To successfully address such challenges, it is essential that an agent is able to efficiently store and process the history of its interactions with the environment (Ni et al., 2021).
Sequence processing methods originally developed for natural language processing (NLP) can be effectively applied to these tasks because the history of interactions with the environment can be represented as a sequence (Hausknecht & Stone, 2015; Esslinger et al., 2022; Samsami et al., 2024). However, in many tasks, due to the complexity or noisiness of observations, the sparsity of events, the difficulty of designing the reward function, and the long duration of episodes, storing and retrieving important information becomes extremely challenging, and the need for memory mechanisms arises (Graves et al., 2016; Wayne et al., 2018; Goyal et al., 2022).


Bridging Generative Networks with the Common Model of Cognition

West, Robert L., Eckler, Spencer, Conway-Smith, Brendan, Turcas, Nico, Tomkins-Flanagan, Eilene, Kelly, Mary Alexandria

arXiv.org Artificial Intelligence

This article presents a theoretical framework for adapting the Common Model of Cognition to large generative network models within the field of artificial intelligence. This can be accomplished by restructuring modules within the Common Model into shadow production systems that are peripheral to a central production system, which handles higher-level reasoning based on the shadow productions' output. Implementing this novel structure within the Common Model allows for a seamless connection between cognitive architectures and generative neural networks.